What is geoconnex.us? It is a framework that lets water data providers publish structured, linked metadata so that web crawlers can organize it into a knowledge graph. This metadata should be formatted as JSON-LD. The knowledge graph can then be used to create a wide range of information products that answer water-related questions. This vignette is a mockup of what navigating the Geoconnex knowledge graph might look like at build-out.
The geoconnex.us framework includes a large and growing catalog of “community reference features”: neutral internet representations of real-world locations and areas that water data providers can use to describe what their published data is about. Many of these reference features are available from the website https://info.geoconnex.us, which is powered by PyGeoAPI, a Python implementation of OGC API-Features. This API enables programmatic interaction with spatial features on the web.
What’s available from info.geoconnex.us? We have collated a variety of common hydrologic and administrative locations and boundaries, and will continue to add more.
collection_url <- "https://info.geoconnex.us/collections"
collections <- jsonlite::fromJSON(collection_url)
knitr::kable(select(collections$collections, title, description))
| title | description |
|---|---|
| HU02 | Two-digit Hydrologic Regions |
| HU04 | Four-digit Hydrologic Subregion |
| HU06 | Six-digit Hydrologic Basins |
| HU08 | Eight-digit Hydrologic Subbasins |
| HU10 | Ten-digit Watersheds |
| National Aquifers | National Aquifers of the United States |
| Reference Gages | US Reference Stream Gage Monitoring Locations |
| States | U.S. States |
| Counties | U.S. Counties |
| American Indian/Alaska Native Areas/Hawaiian Home Lands (AIANNH) | Native American Lands |
| Core-based statistical areas (CBSA) | U.S. Metropolitan and Micropolitan Statistical Areas |
| Urban Areas | Urbanized Areas and Urban Clusters (2010 Census) |
| Places | U.S. legally incorporated and Census designated places |
| Public Water Systems | U.S. Public Water Systems |
Let’s use New Mexico as our area of interest.
nm_url <- "https://info.geoconnex.us/collections/states/items?STUSPS=NM"
nm <- sf::read_sf(nm_url)
mapview::mapview(list(`New Mexico` = nm))
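The request above follows the OGC API-Features pattern of `/collections/{collection}/items` plus property filters in the query string. As a minimal sketch of that pattern, the helper below assembles such a URL from named parameters. `build_items_url` is a hypothetical name introduced here for illustration and is not part of any package. Unlike the hand-written URLs in this vignette, it percent-encodes the filter values.

```r
# Hypothetical helper (not part of any package): build an items URL for a
# collection on an OGC API-Features server from named query parameters.
build_items_url <- function(collection, ..., base = "https://info.geoconnex.us") {
  params <- list(...)
  url <- paste0(base, "/collections/", collection, "/items")
  if (length(params) == 0) {
    return(url)
  }
  # Percent-encode each value so that URIs and spaces survive as query values
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(url, "?", query)
}

# Rebuild the New Mexico request used above
nm_items_url <- build_items_url("states", STUSPS = "NM")
print(nm_items_url)
```

The resulting string could be passed to `sf::read_sf()` in the same way as the hand-written `nm_url`.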
The search above gave us one state that we can retrieve by its ID. Below, we request its JSON-LD representation and print it in two forms: the raw JSON-LD, and the “flattened” form, which interprets the fields according to published linked data vocabularies such as https://schema.org.
accept_jsonld <- httr::add_headers("Accept" = "application/ld+json")
nm_ld <- rawToChar(httr::GET(nm$id, config = accept_jsonld)$content)
prettify(nm_ld)
## {
## "@context": [
## {
## "schema": "https://schema.org/",
## "geojson": "https://purl.org/geojson/vocab#",
## "Feature": "geojson:Feature",
## "FeatureCollection": "geojson:FeatureCollection",
## "Point": "geojson:Point",
## "bbox": {
## "@container": "@list",
## "@id": "geojson:bbox"
## },
## "coordinates": {
## "@container": "@list",
## "@id": "geojson:coordinates"
## },
## "features": {
## "@container": "@set",
## "@id": "geojson:features"
## },
## "geometry": "geojson:geometry",
## "id": "@id",
## "properties": "geojson:properties",
## "type": "@type"
## },
## {
## "schema": "https://schema.org/",
## "NAME": "schema:name",
## "census_profile": {
## "@id": "schema:subjectOf",
## "@type": "@id"
## }
## }
## ],
## "type": "Feature",
## "properties": {
## "fid": 14,
## "STATEFP": "35",
## "STATENS": "00897535",
## "AFFGEOID": "0400000US35",
## "GEOID": "35",
## "STUSPS": "NM",
## "NAME": "New Mexico",
## "LSAD": "00",
## "uri": "https://geoconnex.us/ref/states/35",
## "census_profile": "https://data.census.gov/cedsci/profile?g=0400000US35",
## "id": "https://geoconnex.us/ref/states/35"
## },
## "id": "https://geoconnex.us/ref/states/35"
## }
##
nm_ld <- jsonld::jsonld_flatten(nm_ld)
nm_ld
## [
## {
## "@id": "https://geoconnex.us/ref/states/35",
## "@type": [
## "https://purl.org/geojson/vocab#Feature"
## ],
## "https://purl.org/geojson/vocab#properties": [
## {
## "@id": "https://geoconnex.us/ref/states/35"
## }
## ],
## "https://schema.org/name": [
## {
## "@value": "New Mexico"
## }
## ],
## "https://schema.org/subjectOf": [
## {
## "@id": "https://data.census.gov/cedsci/profile?g=0400000US35"
## }
## ]
## }
## ]
nm_ld <- fromJSON(nm_ld)
This gives us some basic information. The @id here is especially useful. Note that the @id (the subject of all the triples in the document) is the same as the id of the State GeoJSON we mapped above and used to retrieve this JSON-LD document.
Notice that we can get a name using the linked data property https://schema.org/name. Structured data like this is what makes the automated creation of information products possible, which is an aim of geoconnex.us.
nm_feature <- sf::read_sf(nm_ld$`@id`)
nm_feature_name <- nm_ld$`https://schema.org/name`[[1]]$`@value`
# Could make html links clickable in mapview
nm_map_layer <- setNames(list(nm_feature), nm_feature_name)
mapview::mapview(nm_map_layer)
Now let’s pivot to look at some data that might be of particular interest. Let’s say I’m most interested in Las Vegas, New Mexico. Where is this city, and what is its boundary? Let’s search for this place in the U.S. Census Places feature collection within NM (FIPS code 35).
## nm
NM_places <- sf::read_sf("https://info.geoconnex.us/collections/places/items?STATEFP=35&limit=10000000")
LasVegas <- filter(NM_places, NAME=="Las Vegas")
mapview::mapview(LasVegas)
print(LasVegas$uri)
## [1] "https://geoconnex.us/ref/places/3539940"
By printing the uri, we can see the Uniform Resource Identifier (URI) for the city of Las Vegas, NM. Now, what if I want to see which Public Water Systems (PWS) serve this city? Let’s see what Queryables the PWS reference features offer:
knitr::kable(fromJSON("https://info.geoconnex.us/collections/pws/queryables"))
We can query by CITY_SERVED_uri. Let’s try it.
LasVegasURI <- "https://geoconnex.us/ref/places/3539940"
LasVegas_PWS <- sf::read_sf(paste0("https://info.geoconnex.us/collections/pws/items?CITY_SERVED_uri=",LasVegasURI))
mapview::mapview(LasVegas_PWS, color="blue") + mapview::mapview(LasVegas, col.regions="green")
Clicking around, we can find what seems to be the main water provider, PWSID NM3518025 (@id https://geoconnex.us/ref/pws/NM3518025). The harvested JSON-LD includes a “subjectOf” link to the USEPA Safe Drinking Water Information System (SDWIS), indicating that this EPA page is about this particular water system. As geoconnex.us expands and develops, every community reference feature such as this one should have many links that direct us to all kinds of metadata and data published by other organizations that are relevant to the feature.
lv_pws_ld <- rawToChar(httr::GET("https://geoconnex.us/ref/pws/NM3518025", config = accept_jsonld)$content)
lv_pws_ld <- jsonld::jsonld_flatten(lv_pws_ld)
lv_pws_ld
## [
## {
## "@id": "https://geoconnex.us/ref/pws/NM3518025",
## "@type": [
## "https://purl.org/geojson/vocab#Feature"
## ],
## "https://purl.org/geojson/vocab#properties": [
## {
## "@id": "https://geoconnex.us/ref/pws/NM3518025"
## }
## ],
## "https://schema.org/geoIntersects": [
## {
## "@id": "https://geoconnex.us/ref/places/3539940"
## }
## ],
## "https://schema.org/geoWithin": [
## {
## "@id": "https://geoconnex.us/ref/states/35"
## }
## ],
## "https://schema.org/isBasedOn": [
## {
## "@id": "https://catalog.newmexicowaterdata.org/en/dataset/public-water-supply-areas"
## }
## ],
## "https://schema.org/name": [
## {
## "@value": "LAS VEGAS (CITY OF)\r\n"
## }
## ],
## "https://schema.org/subjectOf": [
## {
## "@id": "https://enviro.epa.gov/enviro/sdw_report_v3.first_table?pws_id=NM3518025&state=NM&source=Surface%20water&population=18044"
## }
## ]
## }
## ]
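The flattened document above is effectively a small set of triples: the water system is the subject, each schema.org property is a predicate, and each `@id` is an object. As a rough sketch of how those links could be read programmatically, the code below copies part of the printed node by hand into the nested list that `jsonlite::fromJSON(x, simplifyVector = FALSE)` would return for the first node. `linked_ids` is a hypothetical helper name, not a function from the jsonld or jsonlite packages.

```r
# Hand-copied subset of the flattened node printed above, written as a
# nested list (one node, with a list of objects under each predicate).
pws_node <- list(
  `@id` = "https://geoconnex.us/ref/pws/NM3518025",
  `https://schema.org/geoIntersects` = list(
    list(`@id` = "https://geoconnex.us/ref/places/3539940")
  ),
  `https://schema.org/geoWithin` = list(
    list(`@id` = "https://geoconnex.us/ref/states/35")
  )
)

# Hypothetical helper: return the @id of every object linked by a predicate.
linked_ids <- function(node, predicate) {
  objects <- node[[predicate]]
  if (is.null(objects)) {
    return(character(0))  # predicate not present on this node
  }
  vapply(objects, function(o) o[["@id"]], character(1))
}

state_ids <- linked_ids(pws_node, "https://schema.org/geoWithin")
place_ids <- linked_ids(pws_node, "https://schema.org/geoIntersects")
print(state_ids)
print(place_ids)
```

Each returned URI is itself a geoconnex.us identifier, so it could be dereferenced in the same way as the state and place features earlier in this vignette.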
Now let’s find some more of this other data published by other organizations, not just the reference features. First, let’s widen our scope a bit. Let’s say we’re interested in water data that is in the same HUC8 as Las Vegas.
hu08_url <- paste0("https://info.geoconnex.us/collections/hu08/items?bbox=",
paste(sf::st_bbox(nm_feature), collapse = ","))
hu08 <- sf::read_sf(hu08_url)
hu08 <- hu08[LasVegas,]
hu08$NAME
## [1] "Pecos Headwaters"
mapview::mapview(list("Pecos Headwaters" = hu08, "Las Vegas" = LasVegas),col.regions=c("green","blue"))
Just working with spatial intersections from the HUC8 reference feature collection, we find the relevant HUC8 is the Pecos Headwaters. Now, we can make use of all data within the Geoconnex.us system that has been published by all organizations. On build-out, we will be able to interact with a knowledge graph that is being continually updated by web crawlers that harvest JSON-LD from all participating provider websites. Parts of this knowledge graph, in turn, would be re-presented in the reference features JSON-LD. In this example, the JSON-LD from the page for the Pecos Headwaters (https://geoconnex.us/ref/hu08/13060001) might include links with the relationship label “geoContains” for every water data site about a location within that HUC8. For this exercise, we will load a pre-processed list of dataframes with the same information.
(This could be taken further by building an actual graph, loading it into a triple store, and querying it with SPARQL, though that may be more trouble than it is worth.)
Layers can be turned on and off with the layers button on the top left for greater readability. That’s a lot of data though!
load("graph.rds")
names(within_hu08_13060001)
## [1] "HUC8" "PWS"
## [3] "NMED-SDWIS_Sample_Points" "NMED-NPDES"
## [5] "NMBGMR-Wells" "Reference Gages"
## [7] "WaDE Sites"
mapview::mapview(within_hu08_13060001,
col.regions=c("green",
"blue",
"violet",
"red",
"orange",
"black",
"white"),
cex = c(3,
3,
9,
10,
10,
8,
4))
It looks like, in addition to the HUC8 and the PWS, we also have data from five other layers: NMED SDWIS sample points, NMED NPDES sites, NMBGMR wells, reference gages, and WaDE sites.
A key part of the philosophy of geoconnex is that metadata is published independently by organizations and harvested automatically. Hovering over these various features on the map, we can see that all sites have a uri (@id) that begins with “https://geoconnex.us/”. However, if one follows these links, one is taken to different websites hosted on different servers by different organizations that are not aware of each other. For example, https://geoconnex.us/nmwdi/nmbgmr/wells/WL-0183 redirects to http://wells.newmexicowaterdata.org/collections/nmbgmr_wells/items/WL-0183 , a PyGeoAPI instance operated by the New Mexico Water Data Initiative. Meanwhile https://geoconnex.us/wade/sites/NM_146344 redirects to https://wade-test.geoconnex.us/collections/WaDE/items/NM_146344, a separate PyGeoAPI instance operated by the Water Data Exchange. However, since both data systems provide JSON-LD markup, their metadata can be harvested automatically to create the data discovery workflow visualized here.
This allows us to browse through the metadata. For example, perhaps we are only interested in the SDWIS points and the NPDES permit for the Las Vegas drinking water system:
npdes <- within_hu08_13060001$`NMED-NPDES`
lv <- filter(within_hu08_13060001$PWS,
CITY_SERVED_uri == "https://geoconnex.us/ref/places/3539940")
lv <- st_as_sfc(st_bbox(lv))
npdes <- npdes[lv,]
lv <- filter(within_hu08_13060001$PWS,uri=="https://geoconnex.us/ref/pws/NM3518025")
mapview::mapview(list("Las Vegas NPDES Points"=npdes), col.regions="red") + mapview::mapview(list("Las Vegas PWS"=lv))
The true power of the system comes when organizations link data to the structured metadata they publish and that geoconnex.us can harvest. For example, the southernmost NPDES point (corresponding to the Las Vegas Wastewater Treatment Plant) includes linked data in the “sta” variable, referring to the OGC SensorThings API, at this URL: https://st.newmexicowaterdata.org/FROST-Server/v1.1/Things(2682)?$expand=Datastreams/Observations. Parsing the JSON response is simple. Thus, we can call the actual underlying data from NMED from the harvested metadata, learning that the Las Vegas Wastewater Treatment Plant has an NPDES permit that is effective as of May 01, 2017, and expires April 30, 2022.
data <- as.data.frame(fromJSON(npdes[2,]$sta)$Datastreams$Observations)
print('as.data.frame(fromJSON(npdes[2,]$sta)$Datastreams$Observations)')
## [1] "as.data.frame(fromJSON(npdes[2,]$sta)$Datastreams$Observations)"
knitr::kable(select(data,result,resultTime))
| result | resultTime |
|---|---|
| expiration | 2022-04-30T00:00:00.000Z |
| effective | 2017-05-01T00:00:00.000Z |
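The `resultTime` values in the table are ISO 8601 timestamps, so they can be turned into dates and compared. The sketch below re-enters the two rows by hand rather than calling the SensorThings service, and `check_date` is an arbitrary date chosen only for illustration.

```r
# Values copied by hand from the table above
obs <- data.frame(
  result = c("expiration", "effective"),
  resultTime = c("2022-04-30T00:00:00.000Z", "2017-05-01T00:00:00.000Z"),
  stringsAsFactors = FALSE
)

# Keep only the date part (first 10 characters, YYYY-MM-DD) of each timestamp
obs$date <- as.Date(substr(obs$resultTime, 1, 10))

effective <- obs$date[obs$result == "effective"]
expiration <- obs$date[obs$result == "expiration"]

# Length of the permit term in days
permit_days <- as.numeric(expiration - effective)

# Hypothetical check date, chosen only for illustration
check_date <- as.Date("2020-01-01")
permit_active <- check_date >= effective && check_date <= expiration

print(permit_days)
print(permit_active)
```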
Let’s say we’re interested in conditions downstream of this WWTP. We can use the USGS Hydro Network-Linked Data Index (NLDI) to discover relevant data in that system. We can use the WWTP (@id https://geoconnex.us/nmwdi/nmbgmr/wells/NM0028827) as a starting point, and use the NLDI service to find the downstream mainstem.
geom_site <- sf::read_sf("https://geoconnex.us/nmwdi/nmbgmr/wells/NM0028827")
point <- sf::st_sfc(geom_site$geometry)
geom_site <- geom_site$geometry[[1]]
nldi_point<-nhdplusTools::discover_nhdplus_id(point)
# nldiURL<-"DM = "https://labs.waterdata.usgs.gov/api/nldi/linked-data/nwissite/USGS-08279500/navigate/DM?distance=100"
nldi_query <- URLencode(paste0('https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position?f=json&coords=POINT(',geom_site[1],' ',geom_site[2],')'))
mainstem <- sf::read_sf(nldi_query)
mainstem <- sf::read_sf(paste0(mainstem$navigation,"/DM/flowlines?distance=250"))
mapview::mapview(list("Pecos Headwaters"=within_hu08_13060001$HUC8), col.regions="green") +
mapview::mapview(list("Las Vegas NPDES Points"=npdes), col.regions="red") + mapview::mapview(list("Las Vegas PWS"=lv), col.regions="black") +
mapview::mapview(list("Downtream Mainstem"=sf::st_geometry(mainstem)[within_hu08_13060001$HUC8,]), color="blue")
stream_gages_url <- paste0("https://info.geoconnex.us/collections/gages/items?limit=10000&bbox=",
paste(sf::st_bbox(hu08), collapse = ","))
stream_gages <- sf::st_intersection(sf::read_sf(stream_gages_url),
hu08)
mapview::mapview(stream_gages) + mapview::mapview(hu08, col.regions="green")
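The position lookup used earlier to place the treatment plant on the river network works by sending a WKT point (longitude first, then latitude) to the NLDI. As a small sketch of just the URL-building step, the helper below reproduces that query string. `nldi_position_url` is a hypothetical name, and the coordinates in the example are illustrative rather than the actual plant location.

```r
# Hypothetical helper: build the NLDI "position" lookup URL for a lon/lat pair.
nldi_position_url <- function(lon, lat) {
  base <- "https://labs.waterdata.usgs.gov/api/nldi/linked-data/comid/position"
  # WKT points list longitude (x) before latitude (y)
  wkt <- sprintf("POINT(%s %s)", format(lon, digits = 10), format(lat, digits = 10))
  # URLencode() replaces the space inside the WKT string with %20
  utils::URLencode(paste0(base, "?f=json&coords=", wkt))
}

# Illustrative coordinates only, not the actual WWTP location
example_url <- nldi_position_url(-105.2, 35.5)
print(example_url)
```

The returned string could be passed to `sf::read_sf()`, as is done with `nldi_query` above.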
Browsing around these sites a bit, let’s use the PECOS RIVER NEAR PUERTO DE LUNA, NM. Since the reference gages include network locations on the NHDPlusV2, we can use them with the Hydro Network-Linked Data Index.
We can use the R package nhdplusTools to interact with the NLDI. Below, we get the basin boundary, mainstem, and all sites in the Western States Water Council Water Data Exchange upstream of this stream gage.
site <- filter(stream_gages, name == "PECOS RIVER NEAR PUERTO DE LUNA, NM")
nldi_feature <- list(featureSource = "nwissite",
featureID = paste0("USGS-", site$provider_id))
basin <- nhdplusTools::get_nldi_basin(nldi_feature)
mainstem <- nhdplusTools::navigate_nldi(nldi_feature, "UM", "flowlines", distance_km = 500)
wade <- nhdplusTools::navigate_nldi(nldi_feature, "UM", "wade", distance_km = 500)
mapview::mapview(wade, col.regions="white") + mapview::mapview(hu08, col.regions="green") +
mapview::mapview(sf::st_geometry(mainstem), col.regions="blue")
USGS, dblodgett@usgs.gov
Internet of Water, kyle.onda@duke.edu